Automatic Labelling of Topic Models Using Word Vectors and Letter Trigram Vectors

Authors

  • Wanqiu Kou
  • Fang Li
  • Timothy Baldwin
Abstract

The native representation of LDA-style topics is a multinomial distribution over words, but automatic labelling of such topics has been shown to help readers interpret them better. We propose a novel framework for topic labelling using word vectors and letter trigram vectors. We generate labels automatically and propose automatic and human evaluations of our method. First, we use a chunk parser to generate candidate labels; we then map topics and candidate labels to word vectors and letter trigram vectors in order to find which candidate labels are most semantically related to a given topic. A label is selected by computing the similarity between the topic and its candidate label vectors. Experiments on three datasets show that the method is effective.
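
As a rough illustration of the letter trigram side of this idea, the sketch below builds bag-of-trigram vectors for a topic's top words and its candidate labels, then ranks candidates by cosine similarity. Toy data and hypothetical helper names; the paper's full method also uses word vectors, which are omitted here.

```python
import numpy as np

def letter_trigrams(text):
    """Letter trigrams with '#' marking word boundaries, e.g. 'cat' -> ['#ca', 'cat', 'at#']."""
    padded = "#" + text.replace(" ", "#") + "#"
    return [padded[i:i + 3] for i in range(len(padded) - 2)]

def trigram_vector(text, index):
    """Bag-of-trigrams count vector over a fixed trigram index."""
    vec = np.zeros(len(index))
    for g in letter_trigrams(text):
        if g in index:
            vec[index[g]] += 1.0
    return vec

def cosine(u, v):
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    return float(u @ v) / denom if denom else 0.0

# Toy topic (its top words) and chunk-parser-style candidate labels.
top_words = ["stock", "market", "trade", "investor"]
candidates = ["stock market", "sports news", "financial trading"]

# Build a shared trigram index, represent the topic as the sum of its
# top words' trigram vectors, and pick the most similar candidate.
grams = sorted({g for t in top_words + candidates for g in letter_trigrams(t)})
index = {g: i for i, g in enumerate(grams)}
topic_vec = sum(trigram_vector(w, index) for w in top_words)
best = max(candidates, key=lambda c: cosine(topic_vec, trigram_vector(c, index)))
print(best)  # a finance-related label such as 'stock market' should win
```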

Similar Resources

Kou, Wanqiu, Fang Li and Timothy Baldwin (2015) Automatic Labelling of Topic Models Using Word Vectors and Letter Trigram Vectors, in Proceedings of the Eleventh Asian Information Retrieval Societies Conference (AIRS 2015), Brisbane, Australia.

The native representation of LDA-style topics is a multinomial distribution over words, which can be time-consuming to interpret directly. As an alternative representation, automatic labelling has been shown to help readers interpret the topics more efficiently. We propose a novel framework for topic labelling using word vectors and letter trigram vectors. We generate labels automatically and ...

Mixing Dirichlet Topic Models and Word Embeddings to Make lda2vec

Distributed dense word vectors have been shown to be effective at capturing token-level semantic and syntactic regularities in language, while topic models can form interpretable representations over documents. In this work, we describe lda2vec, a model that learns dense word vectors jointly with Dirichlet-distributed latent document-level mixtures of topic vectors. In contrast to continuous den...
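
The central composition in lda2vec, a context vector that adds a pivot word vector to a document vector built from a sparse mixture of topic vectors, can be sketched roughly as follows. Shapes and names are assumptions; see Moody's paper and code for the actual model.

```python
import numpy as np

rng = np.random.default_rng(0)
n_topics, dim = 20, 300

topic_matrix = rng.normal(size=(n_topics, dim))      # one dense vector per topic
doc_weights = rng.dirichlet(np.full(n_topics, 0.1))  # sparse document-topic mixture
doc_vector = doc_weights @ topic_matrix              # document = mixture of topic vectors

pivot_word_vector = rng.normal(size=dim)             # embedding of the pivot word
context_vector = pivot_word_vector + doc_vector      # lda2vec-style context composition
```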

A Two Level Model for Context Sensitive Inference Rules

Automatic acquisition of inference rules for predicates has been commonly addressed by computing distributional similarity between vectors of argument words, operating at the word space level. A recent line of work, which addresses context sensitivity of rules, represented contexts in a latent topic space and computed similarity over topic vectors. We propose a novel two-level model, which comp...
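
The word-space level of such models computes distributional similarity between predicates' argument-word vectors; a toy version with made-up co-occurrence counts (hypothetical predicates and numbers, not the paper's setup):

```python
import numpy as np

# Co-occurrence counts of argument words (toy numbers) for two predicates.
vocab = ["patient", "disease", "infection", "car"]
args_treat = np.array([3.0, 2.0, 1.0, 0.0])  # arguments observed with 'treat X'
args_cure = np.array([2.0, 3.0, 1.0, 0.0])   # arguments observed with 'cure X'

cos = args_treat @ args_cure / (np.linalg.norm(args_treat) * np.linalg.norm(args_cure))
print(round(float(cos), 3))  # high similarity supports the rule 'treat X -> cure X'
```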

A Joint Semantic Vector Representation Model for Text Clustering and Classification

Text clustering and classification are two main tasks of text mining. Feature selection plays the key role in the quality of the clustering and classification results. Although word-based features such as term frequency-inverse document frequency (TF-IDF) vectors have been widely used in different applications, their shortcoming in capturing semantic concepts of text motivated researchers to use...
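
For reference, TF-IDF document vectors of the kind discussed above can be produced in a few lines with scikit-learn (shown only to make the baseline concrete):

```python
from sklearn.feature_extraction.text import TfidfVectorizer

docs = ["topic models discover latent topics in documents",
        "word vectors capture semantic and syntactic regularities"]
X = TfidfVectorizer().fit_transform(docs)  # sparse matrix: documents x vocabulary terms
print(X.shape)
```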

Improving Topic Coherence with Latent Feature Word Representations in MAP Estimation for Topic Modeling

Probabilistic topic models are widely used to discover latent topics in document collections, while latent feature word vectors have been used to obtain high performance in many natural language processing (NLP) tasks. In this paper, we present a new approach by incorporating word vectors to directly optimize the maximum a posteriori (MAP) estimation in a topic model. Preliminary results show t...
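
One common formulation in this line of work (e.g. latent-feature topic models) mixes the usual multinomial topic-word distribution with a softmax over topic-word vector dot products. This is a hedged sketch of that general idea, not necessarily this paper's exact objective:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

rng = np.random.default_rng(1)
vocab_size, dim = 1000, 100
word_vecs = rng.normal(size=(vocab_size, dim))          # latent feature word vectors
topic_vec = rng.normal(size=dim)                        # latent feature topic vector

multinomial = rng.dirichlet(np.full(vocab_size, 0.01))  # classic LDA topic-word dist.
latent_feature = softmax(word_vecs @ topic_vec)         # vector-based component
lam = 0.6                                               # mixture weight (hyperparameter)
p_word_given_topic = lam * latent_feature + (1 - lam) * multinomial
```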

Publication date: 2015